Metabolism is a fascinating cell machinery underlying life and disease and genome-scale
reconstructions provide us with a captivating view of its complexity. However, deciphering the relationship
between metabolic structure and function remains a major challenge. In particular, turning observed 
structural regularities into organizing principles underlying systemic functions is a crucial task that 
can be significantly addressed after endowing complex network representations of metabolism with 
the notion of geometric distance. Here, we design a cartographic map of metabolic networks by 
embedding them into a simple geometry that provides a natural explanation for their observed 
network topology and that codifies node proximity as a measure of hidden structural similarities. 
We assume a simple and general connectivity law that gives more probability of interaction to 
metabolite/reaction pairs which are closer in the hidden space. Remarkably, we find an astonishing 
congruency between the architecture of E. coli and human cell metabolisms and the underlying 
geometry. In addition, the formalism unveils a backbone-like structure of connected biochemical 
pathways on the basis of a quantitative cross-talk. Pathways thus acquire a new perspective which 
challenges their classical view as self-contained functional units. 



Cells are self-organized entities that carry out specialized tasks at different
interrelated omic levels [1] involving different actors, from coding genes to
energy-carrier or constitutive metabolites. A key to understanding this complex
architecture at a systems level is provided by reliable genome-wide reconstructions
of the set of biochemical reactions that underlie the functional cell machinery [2].
Such reconstructions can be analyzed using tools and techniques from complex network
theory [3-5], a discipline that is being used in the characterization of biological,
chemical, infrastructural, technological, or social systems of complex relationships [HIT].
More precisely, nodes in metabolic networks account for either metabolites or
reactions, while links represent the interactions among them. Apart from providing a
large-scale organizational picture, these network-based representations have made it
possible to analyze sensitive issues in cellular metabolism, such as flux balances [H |2],
regulation [10], robustness [11], or reaction reliability [12].

The advantage of using network-based representations, in whatever context we employ
them, may arguably be questioned on the grounds that complex networks are customarily
modeled as purely topological constructions lacking a true geometric measure of
separation among nodes. This is aggravated by the fact that complex networks have the
small-world property [13], meaning that every pair of nodes in the system is very
close in topological distance. This is an important and obvious degeneracy if we think
in terms of optimizing routing or transportation strategies in man-engineered networks,
but it can be equally crucial when describing metabolic functioning at the single-cell
level. As a matter of fact, the related attempt to separate nodes into communities,
which has already been pursued in different contexts [14] and, in particular, applied
to metabolic networks [15], has proven to be an extremely difficult task. Classical
community detection approaches turn out to be a posteriori classification methods, and
do not provide insights into any potential connectivity law underlying the observed
topology. These questions could be significantly addressed by quantifying the abstract
concept of node proximity in terms of a metric distance which could be incorporated
into a simple and general probabilistic connectivity law. Such a biochemical
connectivity law, relying on metric distances, may provide a simple explanation of the
large-scale topological structure observed in metabolism [16], and it can also be used,
as in this work, to revisit the concept of biochemical pathways.

In this paper, we uncover the hidden geometry of the E. coli and human metabolisms and
find that their network topologies obey an extremely simple and powerful metric-based
probabilistic connectivity law. In particular, given a metabolite/reaction pair
separated by a geometric distance $d_{mr}$ in the underlying metric space, the
probability that a connection exists between them is shown here to be a decreasing
function of the effective distance $d_{\mathrm{eff}} = d_{mr}/(k_r k_m)$, where the
degrees $k_m$ and $k_r$ count the number of their respective neighboring nodes. The
geometric distance $d_{mr}$, a measure of structural affinity between metabolites and
reactions, is in this way modulated by the product of the degrees of the two nodes
involved, so that the degree heterogeneity observed in the metabolic network is
properly taken into account. Naturally, a key ingredient in our approach is the
geometry that substantiates this distance. We find that a simple one-dimensional
closed Euclidean space, i.e., a circle, when combined with the network degree
heterogeneity, is enough to capture the global organization of the network. Using
statistical inference techniques, we find angle-based coordinates in this space for
the full set of metabolites and reactions, which expose the extraordinary congruency
of our model.

As a direct application of the proposed cartographic maps of metabolism, we compare
the results of our embedding with the standard classification of reactions in terms of
biochemical pathways. Such a reaction-aggregated analysis reveals rather disparate
trends when pathways are characterized in terms of the circle-based localizations of
their constituent reactions. Some specific pathways appear concentrated over narrow
sectors of polar angles, while more transversal ones are spread widely over the circle.
This points to a diversity of pathway topologies, with some of them displaying groups
of densely interconnected reactions while others show a much more weakly connected
internal structure. Moreover, pathways themselves can be linked using the discovered
connectivity law. This strategy reveals different levels of cross-talk between
pathways, leading to a coarse-grained view of metabolic networks or, in other words,
to the build-up of networks of pathways. Such a higher level in the hierarchical
organization of metabolic networks advises against the study of pathways as autonomous
subsystems and should make it possible to calibrate more accurately how a
pathway-localized perturbation spreads over the entire network.



I. RESULTS 

A. Embedding algorithm and validation 

A simple abstraction of a given metabolism is given by its bipartite network
representation. This amounts to considering metabolites and reactions as belonging to
different subsets of nodes, with metabolites (whether reactants or products) linked to
all reactions they take part in, thus avoiding connections between nodes of the same
kind, see Fig. 1a. The first step towards mapping this network consists in defining a
geometric model that can advantageously represent it. The simplest realization of a
one-dimensional homogeneous and isotropic closed metric space that can globally embed
a network is a circle of radius $R$. Nodes, in our case metabolites and reactions
separately, are distributed on it according to specific angular coordinates to be
determined. The whole strategy to find these coordinates rests on a precise definition
of the interactions between nodes in terms of their ring-based distances. We prescribe
the connection probability between a reaction $r$ and a metabolite $m$, with
respective bipartite degrees $k_r$ and $k_m$ and separated by a distance $d_{mr}$ on
the circle ($d_{mr} = R\Delta\theta_{mr}$, $\Delta\theta_{mr}$ being the angular
separation between metabolite and reaction), to be a decreasing function of this
distance rescaled by the product of node degrees [17],



$$\mathrm{Prob}\{m \text{ is connected to } r\} = p\left(\frac{d_{mr}}{k_m k_r}\right). \qquad (1)$$



It is worth stressing that this is the central and only law underlying the whole
formalism. Notice that this choice is particularly suggestive: by identifying the
degree of a node as a measure of its mass, this interaction mimics the Newtonian form
of the gravitational interaction. More precisely, the explicit form of the above
interaction reads



$$p\left(\frac{d_{mr}}{k_m k_r}\right) = \frac{1}{1 + \left(\dfrac{d_{mr}}{\mu k_m k_r}\right)^{\beta}}. \qquad (2)$$



This particular prescription combines, in a simple way, the classical
network-topological concept of node degree with the newly introduced notion of
geometric distance. All in all, this functional form expresses an intuitive view:
closer nodes in the metric space are more likely to be linked, while nodes with higher
degrees sustain farther-reaching connections regardless of their distances. Figure 1b
shows a visual sketch summarizing the basic trends of the bipartite formalism just
outlined. We refer to it with the notation $\mathbb{S}^1 \times \mathbb{S}^1$, see
Methods. Besides, this model gives rise to a maximum-entropy ensemble of graphs that
are therefore maximally random given their specific constraints [18, 19]. Finally, the
parameters $\mu$ and $\beta$ are consistently determined to reproduce the statistical
properties of the original network. Parameter $\mu$ fixes the total number of edges,
whereas $\beta$ controls clustering, i.e., a measure of short-range loops, see Appendix B.
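To make the connectivity law of Eq. (2) concrete, the following is a minimal Python sketch that evaluates it for a single metabolite/reaction pair. The function names and all numerical values are illustrative assumptions (only the exponent 1.3 is the fitted value quoted in the text); this is not the authors' code.

```python
import math


def arc_distance(theta_a, theta_b, radius):
    """Length of the shortest arc between two angles (radians) on a circle."""
    delta = abs(theta_a - theta_b) % (2.0 * math.pi)
    return radius * min(delta, 2.0 * math.pi - delta)


def connection_probability(theta_m, theta_r, k_m, k_r, radius, mu, beta):
    """Eq. (2): p = 1 / (1 + (d_mr / (mu * k_m * k_r)) ** beta)."""
    d_mr = arc_distance(theta_m, theta_r, radius)
    return 1.0 / (1.0 + (d_mr / (mu * k_m * k_r)) ** beta)


# Hypothetical coordinates and degrees, chosen only to show the two trends.
p_near = connection_probability(0.0, 0.1, k_m=2, k_r=3, radius=10.0, mu=1.0, beta=1.3)
p_far = connection_probability(0.0, 2.0, k_m=2, k_r=3, radius=10.0, mu=1.0, beta=1.3)
p_far_hub = connection_probability(0.0, 2.0, k_m=40, k_r=3, radius=10.0, mu=1.0, beta=1.3)
print(p_near > p_far, p_far_hub > p_far)
```

With these made-up values, the nearby pair is more likely to be linked than the distant one, and raising the metabolite degree increases the probability at a fixed distance, which is the behavior described in the paragraph above. The probability equals 1/2 exactly when the distance equals the product of $\mu$ and the two degrees.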

To infer the angle-based coordinates for metabolites and reactions in the ring we use
a two-step procedure. Starting from the original bipartite network, we first perform a
one-mode projection onto the set of metabolites by connecting two metabolites whenever
they participate in the same reaction. We then embed this unipartite metabolite
network in the circle by applying the unipartite version of the formalism as described
earlier [20]. Finally, using this partial allocation as an initial fixed template, we
complete the embedding of the reactions by invoking a maximum likelihood inference
strategy (the detailed description of the embedding algorithm and the coordinates of
metabolites and reactions are fully reported in Appendix C).
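The second step of this procedure can be illustrated with a simplified Python sketch: metabolite angles are held fixed and a single reaction is placed at the angle that maximizes the likelihood of its observed links under Eq. (2). The exhaustive grid search, the function names, and the toy data are assumptions made for illustration; the actual algorithm is the one reported in Appendix C, which is not reproduced here.

```python
import math


def log_likelihood(theta_r, k_r, metabolites, neighbors, radius, mu, beta):
    """Log-likelihood of one reaction's links given fixed metabolite positions.

    metabolites: dict name -> (angle in radians, degree)
    neighbors:   set of metabolite names observed to be linked to the reaction
    """
    total = 0.0
    for name, (theta_m, k_m) in metabolites.items():
        delta = abs(theta_m - theta_r) % (2.0 * math.pi)
        delta = min(delta, 2.0 * math.pi - delta)
        x = radius * delta / (mu * k_m * k_r)
        p = 1.0 / (1.0 + x ** beta)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep both logarithms finite
        total += math.log(p) if name in neighbors else math.log(1.0 - p)
    return total


def place_reaction(k_r, metabolites, neighbors, radius, mu, beta, n_grid=360):
    """Return the grid angle with the highest log-likelihood (hypothetical helper)."""
    candidates = [2.0 * math.pi * i / n_grid for i in range(n_grid)]
    return max(
        candidates,
        key=lambda t: log_likelihood(t, k_r, metabolites, neighbors, radius, mu, beta),
    )


# Toy template: two metabolites near angle 0 and two near angle pi.
toy_metabolites = {
    "a": (0.0, 1.0),
    "b": (0.1, 1.0),
    "c": (math.pi, 1.0),
    "d": (math.pi + 0.1, 1.0),
}
best_angle = place_reaction(2.0, toy_metabolites, {"a", "b"}, radius=10.0, mu=1.0, beta=1.3)
```

In this toy case the reaction linked to metabolites "a" and "b" lands between them, far from the unlinked pair. A grid search is used only because it is easy to verify; a stochastic exploration would follow the same likelihood.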

We apply our formalism to the iAF1260 version of the K12 MG1655 strain of E. coli
metabolism [21] and to human cell metabolism [22], both provided in the BiGG database
[23, 24], see Appendix B. Before presenting the embedding for these metabolic
networks, we comment on the validation of the proposed mapping procedure. We first
perform a direct calibration, which amounts to comparing the set of observed
metabolite-reaction connection probabilities in the original reconstructions with the
theoretical connection probability given by Eq. (B9). Explicit results are presented
in Fig. 1c, both for E. coli and human metabolisms. Beyond the striking agreement
between observed and predicted connections, it is worth noting that the two analyzed
networks are perfectly represented with the same exponent $\beta$, fitted to a value
$\beta = 1.3$. We also check the discrimination power of our algorithm by computing
the Receiver Operating Characteristic (ROC) curve of our model [25], which compares
the true positive rate (TPR) with the false positive rate (FPR) and indicates how good
our method is at

FIG. 1: Model and empirical validation. a, Bipartite network representation of four
coupled stoichiometric equations in the pentose-phosphate pathway of E. coli. Reaction
acronyms stand for the catalyzing enzyme: zwf, glucose-6-phosphate dehydrogenase
[EC 1.1.1.49]; pgl, 6-phosphogluconolactonase [EC 3.1.1.31]; gnd, 6-phosphogluconate
dehydrogenase [EC 1.1.1.43]; rpe, ribulose-phosphate 3-epimerase [EC 5.1.3.1]. Notice
that connections (black lines) are always between reactions (yellow circles) and
metabolites (blue squares); metabolites or reactions are never connected among
themselves. b, A sketch of the $\mathbb{S}^1 \times \mathbb{S}^1$ model. Nodes are
randomly distributed in the circle and given expected degrees, symbolically
represented by the sizes of the nodes. The distance between two nodes is computed as
the length of the arc separating the nodes. Due to the peculiar rescaling of distances
by degrees in Eq. (B8), a node can connect not only to nearby nodes but also to far
apart nodes with large degree. c, The plot shows a comparison between the empirical
connection probability for the E. coli and human metabolisms and the theoretical one
given in Eq. (B9). The empirical connection probability is computed as the ratio
between the number of actual connections at effective distance $d_{mr}/\mu k_m k_r$
and the total number of pairs at the same effective distance. d, The Receiver
Operating Characteristic (ROC) curve computed for our model for the E. coli and human
metabolisms is shown. To calculate the ROC curves, we rank (from highest to lowest)
the connection probabilities given by the model for all possible metabolite/reaction
pairs (either present or absent) using the previously inferred coordinates. We then
define at each value a threshold probability that allows us to discriminate positive
interactions (those above the threshold) from negative ones (those below the
threshold) and to compute the fraction of true positive connections (True Positive
Rate, TPR) and that of false positive connections (False Positive Rate, FPR), with the
understanding that a true positive connection is an observed link above the threshold,
while a false positive is a non-existing one above the threshold.



correctly discerning real links. Results are shown in Fig. 1d. When plotting the TPR
against the FPR, a totally random guess would result in a straight line along the
diagonal. In contrast, the ROC curve of our model lies far above the diagonal, which
indicates a remarkable discrimination power. A convenient summary statistic can be
defined as the area under the ROC curve (AUC statistic), which represents the
probability that a randomly chosen observed link in the network has a higher
probability of existence according to the model than a randomly chosen non-existing
one. For a predictor that is no worse than random, this statistic ranges in the
interval [0.5, 1], with AUC = 0.5 corresponding to a random prediction and AUC = 1 to
a perfect prediction. In our case, values are AUC = 0.96 for E. coli and AUC = 0.97
for human metabolism. Both validation tests confirm that our model adjusts nearly
perfectly to the real data.
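The probabilistic reading of the AUC statistic given above translates directly into a short computation. The Python sketch below compares every observed link with every non-existing one and counts how often the model ranks the observed link higher; the function name and the convention of counting ties as one half are assumptions, and the quadratic loop is meant for clarity rather than for networks of realistic size.

```python
def auc_from_scores(scores_present, scores_absent):
    """Probability that a random observed link outranks a random absent one.

    scores_present: model probabilities of pairs that are linked in the network
    scores_absent:  model probabilities of pairs that are not linked
    Ties contribute 1/2 (a common convention, assumed here).
    """
    wins = 0.0
    for p in scores_present:
        for q in scores_absent:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_present) * len(scores_absent))


# Made-up scores: three of the four (present, absent) comparisons are won.
example_auc = auc_from_scores([0.9, 0.3], [0.5, 0.1])
```

For the made-up scores in the example, three out of four comparisons favor the observed link, so the statistic is 0.75.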

Figure 2 shows the embedding representation of the E. coli metabolism (the mapping of
the human metabolism is provided in Appendix F). For the sake of clarity, metabolites
are displaced towards the center of the circle by an amount that grows with the
logarithm of their degree (see the caption of Fig. 2), so that hub metabolites are
close to the center of the disk whereas low-degree ones are placed in the periphery.
The distribution over the circle is far from uniform, contrary to what might naively
be expected. Indeed, this is a distinctive signature of the delicate structural
organization of metabolic networks. In particular, different levels of aggregation are
readily visible, much as human settlements are unevenly distributed in population
maps. Alongside densely occupied areas, empty regions are visible and appear
irregularly punctuated with occasional metabolite-reaction associations. As a whole,
this landscape is an indication of some hierarchical trends existing in the analyzed
networks and prompts us to look for possible higher organizational levels. In this
regard, we revisit the biochemical concept of pathways, classically understood as
chains of step-by-step reactions which transform a principal chemical into another,
either for immediate use, to propagate metabolic fluxes, or for cell storage. In
Fig. 2 we identify pathways in the circle by plotting their names at the average
angular position of all their constitutive reactions.



B. Pathway localization 

In Figs. 3 and 4 we propose two complementary representations of the metabolic
pathways of E. coli as they appear annotated in the BiGG database. In Fig. 3 we show
the angular distribution on the ring of the whole list of pathways (up to 33, plus an
Unassigned category of reactions not represented in the figure), evaluated from the
circle-based embedding of the reactions they involve. We recognize rather disparate
spectra of angular distributions. Strongly localized pathways, e.g., the Folate
pathway or Oxidative Phosphorylation, coexist with more distributed ones. The latter
can adopt either a discrete bimodal or a multi-peaked form, e.g., the Histidine and
Glycolysis pathways respectively, or can even spread transversally over the ring,
approaching a homogeneous distribution. The Alternate Carbon, the Transport Inner




FIG. 2: Global geometric map of E. coli's metabolism. Angular distribution of reactions and metabolites inferred by
the method. Yellow circles represent reactions whereas blue squares are metabolites. For each metabolite, the symbol size
is proportional to the logarithm of the degree and radially placed according to the expression $r = R - 2\ln k_m$. Black (grey)
connections are those that according to the model have a probability of existence larger (smaller) than 0.5. The names of
the different pathways, radially written, are located at the average angular position of all the reactions belonging to a given
pathway, and the font size is proportional to the logarithm of the number of reactions in the pathway. Notice that we do not
represent transversal pathways and that some pathways seem to be located in empty regions (e.g., Inorganic Ion Transport).
This is due to the fact that some pathways display bimodal or multi-peaked distributions so that the average appears in between
the peaks, see Fig. 3 and Table I in Appendix D.



Membrane, or the Cofactor and Prosthetic Group pathways are representative examples of
this latter category (see Table I in Appendix D for further details). Our method is
therefore able to discriminate concentrated pathways, consistent with the classical
view of modular subsystems, from others which are in fact formed of subunits, and even
from those ultimately responsible for producing or consuming metabolites that are in
turn extensively used by many other pathways.
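The angular spectra discussed here are, according to the caption of Fig. 3, fractions of a pathway's reactions falling in each of 50 equal angular bins. A minimal Python sketch of that tabulation follows; the function name and the sample angles are hypothetical, and angles are assumed to be given in degrees.

```python
def angular_histogram(angles_deg, n_bins=50):
    """Fraction of a pathway's reactions in each of n_bins equal angular bins."""
    width = 360.0 / n_bins
    counts = [0] * n_bins
    for angle in angles_deg:
        # The trailing modulo guards against rounding that would yield index n_bins.
        index = int((angle % 360.0) // width) % n_bins
        counts[index] += 1
    total = len(angles_deg)
    return [c / total for c in counts]


# Hypothetical reaction angles for one pathway (degrees).
spectrum = angular_histogram([1.0, 2.0, 100.0, 355.0])
largest_peak = max(spectrum)
```

A strongly localized pathway yields a spectrum with one dominant bin, whereas a transversal pathway yields many small, comparable fractions. In this made-up example half of the reactions fall in the first bin.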

The embedding of reactions and metabolites in the circle can also be used to aggregate
pathways into broader categories. To do so, the embedding circle is first divided into
eight different angular sectors delimited by void regions in the ranked distribution
of reaction angles, see Fig. 4a. The pathway concentration, i.e., the fraction of
reactions of that pathway in each sector, is shown in Fig. 4b-i. Clearly, there are
sectors monopolized by one or at most two pathways (e.g., Murein in Sector 7,
Fig. 4h), whereas other sectors are largely shared by many pathways (e.g., different
amino-acid-based pathways in Sector 4, Fig. 4e). In all cases, the higher
concentrations in each sector mostly correspond to pathways

FIG. 3: Angular distribution of biological pathways in E. coli. The whole angular
domain [0°, 360°] is divided into 50 bins of 7.2° each and for each bin we compute the
fraction of reactions of the pathway in it. Each pathway is shown in a different
graph. Different colors indicate different general metabolic classes: red for Amino
Acids metabolism (numbering the graphs from left to right and from top to bottom,
1-10), orange for metabolism of Cofactors and Vitamins (11-12), violet for Nucleotide
metabolism (13-14), magenta for tRNA charging (15), turquoise for Carbohydrate
metabolism (16-22), grey for Alternate Carbon metabolism (23), blue for Energy
metabolism (24, 27), green for Transport pathways (25-26), brown for Glycan metabolism
(28-30), and maroon for Lipid metabolism (31-33). Pathway names have been abbreviated
in standard forms whenever possible.

FIG. 4: Sector modules for E. coli's metabolism. Reactions in related functional
categories are observed to aggregate in specific regions of the circle. The whole
angular domain is divided into eight different angular sectors delimited by void
regions in the ranked distribution of reaction angles. This distribution and the
angular coordinates defining the sectors are given in the upper left graph of the
panel. Each sector is indicated in a different color. The remaining graphs show the
pathway concentration, the fraction of reactions of that pathway, in each sector. The
higher concentrations in each sector mostly correspond to pathways in related
functional categories: S1 and S2 aggregate pathways related to Cell Membrane
metabolism (plots b and c), S3 concentrates Central metabolism including Energy and
part of the Nucleotide metabolism (plot d), S4 gathers Central metabolism including
Amino Acids metabolism (plot e), S5 condenses the remaining Nucleotide metabolism
(plot f), and S6 and S7 account for Glycan metabolism (plots g and h), with S6 mixing
basically mono- and polysaccharide-related pathways and pathways related to murein, a
polymer that forms the cell wall, well separated in S7. Pathway names have been
abbreviated in standard forms whenever possible.



in related functional categories: Sector 1 and Sector 2 in Fig. 4b-c aggregate
pathways related to Cell Membrane metabolism; Sector 3 and Sector 4 in Fig. 4d-e
concentrate Central metabolism, with Sector 3 including Energy and part of the
Nucleotide metabolism and Sector 4 including Amino Acid metabolism; Sector 5 in
Fig. 4f condenses the remaining Nucleotide metabolism; and Sector 6 and Sector 7 in
Fig. 4g-h account for the Glycan metabolism, with Sector 6 mixing basically mono- and
polysaccharide-related pathways, while the pathways related to murein, a polymer that
forms the cell wall, appear well separated in Sector 7.
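The pathway concentration per sector can be computed as in the following Python sketch. The sector boundaries used in the example are placeholders, since the actual values are only given graphically in Fig. 4; the function names and the simplifying assumption that no sector straddles the 0° origin are also illustrative.

```python
import bisect


def sector_index(angle_deg, sector_starts):
    """Index of the sector containing an angle.

    sector_starts: sorted start angles in degrees, the first one equal to 0
    (assumes, for simplicity, that no sector wraps around the origin).
    """
    return bisect.bisect_right(sector_starts, angle_deg % 360.0) - 1


def pathway_concentration(reaction_angles, reaction_pathway, sector_starts):
    """Return {pathway: [fraction of its reactions in each sector]}."""
    counts = {}
    for reaction, angle in reaction_angles.items():
        row = counts.setdefault(reaction_pathway[reaction], [0] * len(sector_starts))
        row[sector_index(angle, sector_starts)] += 1
    return {pathway: [c / sum(row) for c in row] for pathway, row in counts.items()}


# Placeholder data: four equal sectors and three reactions in two pathways.
starts = [0.0, 90.0, 180.0, 270.0]
angles = {"r1": 10.0, "r2": 100.0, "r3": 95.0}
pathways = {"r1": "A", "r2": "A", "r3": "B"}
concentration = pathway_concentration(angles, pathways, starts)
```

In the placeholder data, pathway "A" is split evenly between the first two sectors while pathway "B" sits entirely in the second one, which mirrors the distinction between shared and monopolized sectors drawn in the text.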

Corresponding representations for human metabolism are shown in Fig. 10. The number of
pathways is considerably larger, but features in common with the E. coli pathway
localization patterns are evident in qualitative terms. Pathways can again be divided
into different categories according to their angular concentration, with the
difference that the general level of pathway localization in human metabolism is
higher than in E. coli. The average angular concentration of pathways in human
metabolism is 0.82, as compared to 0.79 in E. coli (see Methods), and the average size
of the maximum peaks in the pathway angular distributions is 0.36 for E. coli while
for human metabolism it is 0.50. However, the higher level of localization seems to
coexist with a higher entanglement of the different families of metabolic reactions,
i.e., carbon metabolism, lipid metabolism, etc. Another observation is that
transversal pathways in E. coli, like Cofactor and Prosthetic Group or Transport, are
split into a number of more specialized pathways in human metabolism and, in fact, the
category of transversal pathways itself, as defined in E. coli, is here minimally
represented.



C. Cross-talk between pathways 




FIG. 5: Metabolic backbones displaying pathway cross-talk inferred from the model.
a, Metabolic backbone for E. coli metabolism at the 0.064 confidence level with 30% of
the original total weight, 91% of the original number of pathways, and 9% of the
original links. b, Metabolic backbone for human cells at the 0.022 confidence level,
with 20% of the original total weight, 69% of the original number of pathways, and 5%
of the original links. Different colors indicate different metabolic families as
defined in the caption of Fig. 3. The area of a circle representing a pathway is
proportional to its size in number of reactions. The weights in the connections are
proportional to the intensity of the cross-talk between the pathways. Pathway names
have been abbreviated in standard forms whenever possible.

In the first part of the paper, the $\mathbb{S}^1 \times \mathbb{S}^1$ formalism was
applied to embed the observed metabolic networks into a circle, making it possible to
locate the reactions and metabolites related to every specific pathway in a simple
one-dimensional geometry. This information can be used to build a higher hierarchical
level in the architecture of the metabolic network aimed at quantifying the
interconnectivity between pathways. In turn, this allows us to introduce the concept
of a network of pathways.

Adjacencies between a pair of pathways are computed on the basis of the corresponding
lists of reactions in each pathway and the set of metabolites shared by both lists.
When the set of overlapping metabolites is not empty, the connection probabilities of
the links between pathway reactions and common metabolites that correspond to observed
interactions in the network are summed to give an absolute measure of the strength of
the interaction between the pair of pathways. Overlaps between pathway pairs assemble
a higher-order weighted network where pathways are nodes and links display
heterogeneous intensities. However, the resulting network is very dense and needs to
be conveniently filtered in order to provide meaningful information about the system.
In E. coli, 460 out of a potential total of 561 pathway pairs overlap, while for human
cells 1689 pathway pairs out of 4278 have common metabolites. In practice we use a
disparity-based threshold [26] (see Methods) that discards links whose intensities are
compatible with random fluctuations at some specific significance level. As a result,
these pathway-based networks provide metabolic backbones, i.e., subnetworks of
pathways which display the statistically relevant interactions.
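The two operations described in this paragraph can be sketched in Python as follows. The first function sums model probabilities over observed links between shared metabolites and the reactions of both pathways, as stated in the text. The second applies the disparity filter in its commonly used form, where a link of weight $w$ attached to a node of degree $k$ and strength $s$ is kept if $(1 - w/s)^{k-1}$ falls below the significance level for at least one endpoint; treating this as the threshold of [26] is an assumption, because the Methods section detailing it is not reproduced here. All names and data are hypothetical.

```python
def crosstalk_weight(reactions_a, reactions_b, links, prob):
    """Sum of model probabilities over observed links between shared metabolites
    and the reactions of either pathway.

    links: set of observed (metabolite, reaction) pairs
    prob:  dict mapping each observed pair to its model connection probability
    """
    mets_a = {m for (m, r) in links if r in reactions_a}
    mets_b = {m for (m, r) in links if r in reactions_b}
    shared = mets_a & mets_b
    both = reactions_a | reactions_b
    return sum(prob[(m, r)] for (m, r) in links if m in shared and r in both)


def disparity_backbone(weights, alpha):
    """Keep links that are significant for at least one endpoint.

    weights: dict {(node_i, node_j): weight} describing an undirected network
    alpha:   significance level
    """
    strength, degree = {}, {}
    for (i, j), w in weights.items():
        for node in (i, j):
            strength[node] = strength.get(node, 0.0) + w
            degree[node] = degree.get(node, 0) + 1
    kept = {}
    for (i, j), w in weights.items():
        for node in (i, j):
            k = degree[node]
            # Degree-one nodes carry no information about heterogeneity.
            if k > 1 and (1.0 - w / strength[node]) ** (k - 1) < alpha:
                kept[(i, j)] = w
                break
    return kept


# Hypothetical star of pathways around a hub "H" with one dominant link.
toy_weights = {("H", "A"): 10.0, ("H", "B"): 0.1, ("H", "C"): 0.1, ("H", "D"): 0.1}
backbone = disparity_backbone(toy_weights, alpha=0.05)
```

In the toy star, only the dominant link survives the filter, because the three weak links are compatible with a random partition of the hub's strength.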

As an illustration of the power of the metabolic backbone concept, the panels in
Fig. 5 show the corresponding constructions for E. coli and human metabolisms.
Interestingly, metabolic backbones offer a perspective that reveals functional
constraints. Both for E. coli and human metabolism, star-like patterns are
particularly neat. In E. coli, transversal pathways act as hub-like structures that
interconnect varying numbers of specific and more localized pathways, usually
belonging to the same metabolic family. For instance, the Cofactor and Prosthetic
Group Biosynthesis pathway connects many of the amino acid pathways to energy or
nucleotide metabolism, and Alternate Carbon acts as the main intermediary of many
Carbohydrate pathways with the rest of the backbone. Analogously, some pathways in the
metabolic backbone of the human cell, like Folate, Fatty Acid Oxidation, or Keratan
Sulfate Biosynthesis, play a relevant role in providing systems-level connectivity to
the network and connect a number of other specific pathways.



II. DISCUSSION 

From a broad perspective, a cartographic representation of complex networks involves
mapping the positions of nodes in an underlying geometric space and shares some
fundamental problems with traditional geographical cartography as regards techniques,
generalizations, or design: how to represent the topology of the mapped network on the
metric space, which characteristics of the network are not relevant to the map's
purpose and can be eliminated, how to reduce the complexity of the characteristics
that will be mapped, etc. Despite the difficulties, cartographic maps based on
geometrical spaces are crucial to identify dominant nodes, to understand how different
subparts of the system, like pathways in our context, relate to each other, to back up
more accurate methods of prediction of missing and spurious interactions [271 EH], or
to find optimal transport routes.

In our metabolic maps, the astonishing congruency between the architecture of
metabolic networks and the underlying geometry is supported by a biochemical
interaction law that, irrespective of the organisms studied, of the nature and
complexity of the reactions they account for, or of the different structural labels of
the metabolites they involve, seems to comply with a simple Newtonian-like form and
allows us to make predictions about the probabilities of interaction among sets of
metabolites forming reactions. Specifically, the sum of the probabilities running over
all the metabolites participating in a certain biochemical reaction can be interpreted
as a topological version of the well-known concept of reaction-based affinity, and
each summand could thus be identified with the chemical potential of that particular
metabolite in relation to its chemical partners in the particular reaction. Our
results point to a systems-level definition of chemical affinity in terms of
network-based probabilities of interaction which depend on the distances in the
underlying geometric space and on intrinsic properties of nodes which turn some of
them into hubs.

Such probabilistic network-based chemical affinities allow us to recover the
established biochemical organization of pathways as connected metabolic families, but
at the same time raise new questions that call for rethinking their classical
definition as self-contained units. We find that different pathways may have disparate
internal structures, some of them being more modular and conforming better to the
classical definition, while subunits pointing to differentiated functionalities can be
distinguished in others. We have also unveiled a higher level of systems-level
interactions represented by metabolic backbones, defined on the basis of a
quantitative cross-talk between pathways. This particular idea advises against the use
of very specific biochemical protocols aimed at singling out particular pathways, as
they might be prone to underestimate the delicate connections that underlie the
network and secure its proper functioning. Such metabolic features are common to human
cells and E. coli. However, a comparative study shows that pathways in human
metabolism are in general more modular and display less overlap of common metabolites
with other pathways. At the same time, the different human metabolic families are more
entangled and sectors are more difficult to characterize, a possible signature of a
higher functional complexity or merely a side effect of the kind of reconstruction,
which mixes in a single network reactions that happen in diversely differentiated
cells.

Summarizing, in this work we provide cartographic maps of two representative
metabolisms that capture their specific complexities, explain many of their system
properties, and provide a new perspective on the definition, cross-talk, and
hierarchical organization of biochemical pathways. These maps, embedded in a simple
geometric space, rely on a probabilistic biochemical connectivity law which emerges
from the different physico-chemical forces acting at a molecular level and that
naturally conveys a higher interaction likelihood to elements which are closer in the
underlying space. Similar maps for other biological networks are expected to be
equally congruent and to help to transform data into knowledge and knowledge into
understanding, paving the way for new discoveries in systems biology prediction and
control.
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Appendix A: Methods 

1. Hidden metric spaces and the $\mathbb{S}^1 \times \mathbb{S}^1$ model.

The $\mathbb{S}^1 \times \mathbb{S}^1$ model can be used as a network generator
as follows:

1. $N_m$ metabolites and $N_r$ reactions are homogeneously
distributed on a circle of radius $R$. The densities of
metabolites and reactions on the circle are
$\delta_m = N_m/2\pi R$ and $\delta_r = N_r/2\pi R$, taken
independent of the network size. Without loss of
generality, one of them can be set to 1.

2. Metabolites and reactions are assigned expected degrees
$k_m$ and $k_r$, drawn from the probability densities
$\rho_m(k_m)$ and $\rho_r(k_r)$, respectively. To model
metabolic networks, we use $\rho_m(k_m) \sim k_m^{-\gamma}$ and
$\rho_r(k_r) = \delta(k_r - \langle k_r \rangle)$.

3. Each possible metabolite/reaction pair is visited
once and a link is created with probability

$$p(k_m, \theta_m; k_r, \theta_r) = p\left(\frac{d_{mr}}{\mu k_m k_r}\right), \tag{A1}$$

where $d_{mr} = R\Delta\theta_{mr}$ ($\Delta\theta_{mr}$ is the angular
separation) is the metabolite/reaction distance on the
circle. Function $p$ can be, a priori, any integrable
function. However, the choice $p(x) = (1 + x^{\beta})^{-1}$
generates maximally random networks given the
constraints of the model.
See Appendix B for extended details on the $\mathbb{S}^1 \times \mathbb{S}^1$ model.



2. Inverse problem 

Given a complex network representation, the inverse
problem of embedding the network in the hidden metric
space amounts to finding the optimal position of every node
in that underlying geometry. The optimal coordinates
would ensure that, given the specific form of the connection
probability in Eq. (B9), the model has a maximum
probability to reproduce the observed topology. In general
terms, the embedding is resolved using statistical
inference techniques, basically a maximum likelihood
estimation in combination with a Monte Carlo method and
a Metropolis-Hastings rule to explore and select possible
configurations in the underlying space. More precisely,
the likelihood functional is defined as

$$\mathcal{L} = \prod_{m=1}^{N_m} \prod_{r=1}^{N_r} \left[p(k_m, \theta_m; k_r, \theta_r)\right]^{a_{mr}} \left[1 - p(k_m, \theta_m; k_r, \theta_r)\right]^{1 - a_{mr}}, \tag{A2}$$

where $a_{mr}$ is the bipartite adjacency matrix of the network,
defined as $a_{mr} = 1$ if metabolite $m$ participates in
reaction $r$ and zero otherwise. The bipartite nature of
metabolic networks, together with the fact that reactions
and metabolites have disparate degree distributions,
precludes performing the mapping in a single step. Rather,
the embedding into the $\mathbb{S}^1 \times \mathbb{S}^1$ space runs in two phases:
first, the one-mode projection of the metabolic network over
metabolites is embedded into an $\mathbb{S}^1$ space following the
numerical optimization procedures described in [20], and second,
the inferred angular coordinates of metabolites are used
as an input to adjust the position of each individual
reaction in the circle. See Appendix C for a more complete
description of the $\mathbb{S}^1 \times \mathbb{S}^1$ embedding algorithm.
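
As a minimal illustration of Eq. (A2), the following Python sketch evaluates the logarithm of the likelihood for a given assignment of coordinates, assuming the connection probability $p(x) = (1 + x^{\beta})^{-1}$ of Appendix A.1. The function and argument names (`log_likelihood`, `theta_m`, `k_m`, and so on) are illustrative choices rather than part of the original method, and the clipping constant `eps` is an assumption added only to avoid taking the logarithm of zero.

```python
import math


def connection_probability(x, beta):
    """Connection probability p(x) = 1 / (1 + x**beta)."""
    return 1.0 / (1.0 + x ** beta)


def angular_separation(theta_a, theta_b):
    """Smallest angle between two points on the circle, in [0, pi]."""
    return math.pi - abs(math.pi - abs(theta_a - theta_b) % (2.0 * math.pi))


def log_likelihood(adjacency, theta_m, theta_r, k_m, k_r, radius, mu, beta):
    """Logarithm of Eq. (A2) for a bipartite adjacency matrix.

    adjacency[m][r] is 1 if metabolite m participates in reaction r.
    """
    eps = 1e-12  # assumption: clip probabilities to keep the logarithms finite
    total = 0.0
    for m, row in enumerate(adjacency):
        for r, a_mr in enumerate(row):
            distance = radius * angular_separation(theta_m[m], theta_r[r])
            p = connection_probability(distance / (mu * k_m[m] * k_r[r]), beta)
            p = min(max(p, eps), 1.0 - eps)
            total += math.log(p) if a_mr else math.log(1.0 - p)
    return total
```

In an embedding procedure of the kind described above, a Monte Carlo move would be accepted or rejected by comparing this quantity before and after the proposed change of coordinates.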



3. The disparity filter 

To extract the metabolic backbone of cross-talks between
pathways we apply the multi-scale disparity filter
defined in [26]. The disparity filter exploits local
heterogeneity and correlations among weights in complex
weighted network representations to extract the network
backbone by considering the relevant edges at all the
scales present in the system. It ensures that small nodes
in terms of strength ($s_i = \sum_{j \in \text{neighbors}} w_{ij}$, the sum of
incident weights to node $i$) are not neglected and that the
backbone remains connected and does not disaggregate
into separate clusters. The methodology preserves
interactions with a statistically significant intensity for at
least one of the two nodes the edge is incident to. To decide
whether a connection is relevant, the filter compares
against a null hypothesis which assumes that the local
weights associated to a node are uniformly distributed
at random. In this way one discounts intensities that
could be explained by random fluctuations. The disparity
filter produces better results in terms of preserving
the maximum number of nodes and weights in the backbone
with the minimum number of links as compared
to a global threshold filter that selects all the links with
weights above a certain value, see Fig. 8 in Appendix E.
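
A minimal Python sketch of the filter is given below. It relies on the fact that, under the uniform null hypothesis, the integral in Eq. (E2) of Appendix E has the closed form $\alpha_{ij} = (1 - w_{ij}/s_i)^{k-1}$. The representation of the weighted network as a dictionary keyed by node pairs, the function name `disparity_backbone`, and the convention that a node with a single link can never make that link significant by itself are assumptions of this sketch.

```python
def disparity_backbone(weights, alpha):
    """Keep the links that are significant for at least one of their end nodes.

    weights: dict mapping an undirected pair (i, j) to a positive weight.
    alpha:   significance level.
    """
    strength, degree = {}, {}
    for (i, j), w in weights.items():
        for node in (i, j):
            strength[node] = strength.get(node, 0.0) + w
            degree[node] = degree.get(node, 0) + 1

    def p_value(node, w):
        k = degree[node]
        if k <= 1:
            return 1.0  # assumption: a single link cannot be tested at this node
        return (1.0 - w / strength[node]) ** (k - 1)

    return {
        edge: w
        for edge, w in weights.items()
        if min(p_value(edge[0], w), p_value(edge[1], w)) < alpha
    }
```

For instance, in a star where one link carries almost all of the hub's strength, only that link survives at a significance level of 0.05, while the weak links to degree-one nodes are discarded.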



4. Average angular position and concentration of 
pathways. 

To find the average angular position of a given pathway
and a measure of its angular concentration (or dispersion),
we use the following method. Each reaction $i$
of a given pathway (with $i = 1, \dots, N_p$ reactions in it) is
assigned a normalized vector $\vec{r}_i$ pointing to the position
of the reaction on a circle of radius 1, using as angular
coordinate the one inferred by our method. The average
angular position of the pathway is then defined as the
angular coordinate of the average vector
$\langle \vec{r} \rangle = \sum_{i=1}^{N_p} \vec{r}_i / N_p$.
We use this method to plot the names of the different
pathways in Fig. 2. The modulus of the average vector,
$|\langle \vec{r} \rangle|$, is a measure of the angular concentration of the
reactions. A value $|\langle \vec{r} \rangle| = 1$ means that all reactions in
the pathway have the same angular coordinate, whereas
$|\langle \vec{r} \rangle| = 0$ indicates a perfectly homogeneous distribution
over the circle.
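
The computation just described can be sketched in a few lines of Python, representing each unit vector as a complex number on the unit circle. The function name `pathway_angular_summary` is an illustrative choice, and angles are assumed to be given in radians.

```python
import cmath
import math


def pathway_angular_summary(angles):
    """Return (average angular position, angular concentration) of a pathway.

    angles: angular coordinates (radians) of the reactions in the pathway.
    """
    mean_vector = sum(cmath.exp(1j * theta) for theta in angles) / len(angles)
    mean_angle = cmath.phase(mean_vector) % (2.0 * math.pi)
    return mean_angle, abs(mean_vector)
```

When the concentration is close to zero the average angle is ill-defined and should not be interpreted.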



Appendix B: The $\mathbb{S}^1$ model and its extension to
bipartite networks

The $\mathbb{S}^1$ model [17] is a complex network generator able
to generate networks which are, simultaneously, scale-free,
small-world, and highly clustered, as observed in
the majority of real networks. Nodes in this model are
distributed in a metric space (in the simplest case a
one-dimensional circle) abstracting (dis)similarities among the
elements of the network. The $\mathbb{S}^1$ model generates
networks according to the following steps:

1. Distribute $N$ nodes uniformly over the circle $\mathbb{S}^1$ of
radius $N/(2\pi)$, so that the node density on the circle
is fixed to 1.

2. Assign to all nodes a hidden variable $\kappa$ representing
their expected degrees. To generate scale-free networks,
$\kappa$ is drawn from the power-law distribution

$$\rho(\kappa) = \kappa_0^{\gamma-1} (\gamma - 1) \kappa^{-\gamma}, \quad \kappa \in [\kappa_0, \infty), \tag{B1}$$

$$\kappa_0 = \frac{\gamma - 2}{\gamma - 1} \langle k \rangle, \tag{B2}$$

where $\kappa_0$ is the minimum expected degree, and $\langle k \rangle$
is the network average degree.

3. Let $\kappa$ and $\kappa'$ be the expected degrees of two nodes
located at distance $d = N\Delta\theta/(2\pi)$ measured over
the circle, where $\Delta\theta$ is the angular distance between
the nodes. Connect each pair of nodes with probability
$p(x)$, where the effective distance is defined
as $x \equiv d_{\mathrm{eff}} = d/(\mu \kappa \kappa')$, and $\mu$ is a constant fixing the
average degree.

The connection probability $p(x)$ can be any integrable
function. Here we choose the Fermi-Dirac distribution

$$p(x) = \frac{1}{1 + x^{\beta}}, \tag{B3}$$

where $\beta$ is a parameter that controls clustering in the
network. With this connection probability, parameter $\mu$
becomes

$$\mu = \frac{\beta \sin(\pi/\beta)}{2\pi \langle k \rangle}. \tag{B4}$$

The expected degree of a node with hidden variable $\kappa$ is
$\bar{k}(\kappa) = \kappa$ and, therefore, the degree distribution scales as
$P(k) \sim k^{-\gamma}$ for large $k$. Notice that this is the reason
why in the main text we use degrees instead of expected
degrees.
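
The three steps of the $\mathbb{S}^1$ generator can be sketched in Python as follows, assuming the power-law density of Eq. (B1), the connection probability $p(x) = 1/(1 + x^{\beta})$, and $\mu = \beta \sin(\pi/\beta) / (2\pi \langle k \rangle)$. The function name `generate_s1_network`, the use of inverse-transform sampling for $\kappa$, and the brute-force loop over all pairs are choices of this sketch, which is only practical for small networks and requires $\beta > 1$.

```python
import math
import random


def generate_s1_network(n, mean_degree, gamma, beta, seed=None):
    """Generate one realization of the S^1 model; returns (theta, kappa, edges)."""
    rng = random.Random(seed)
    kappa_0 = mean_degree * (gamma - 2.0) / (gamma - 1.0)                   # Eq. (B2)
    mu = beta * math.sin(math.pi / beta) / (2.0 * math.pi * mean_degree)    # Eq. (B4)
    radius = n / (2.0 * math.pi)                                            # density 1
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    # Inverse-transform sampling of the power law in Eq. (B1).
    kappa = [kappa_0 * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            delta = math.pi - abs(math.pi - abs(theta[i] - theta[j]))
            x = radius * delta / (mu * kappa[i] * kappa[j])
            if rng.random() < 1.0 / (1.0 + x ** beta):                      # Eq. (B3)
                edges.append((i, j))
    return theta, kappa, edges
```

In finite networks the realized average degree falls somewhat below the target `mean_degree`, because both the power law and the range of distances are truncated.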



1. The $\mathbb{S}^1 \times \mathbb{S}^1$ model

The $\mathbb{S}^1$ model can be extended to bipartite networks
as follows:

1. $N_m$ metabolites and $N_r$ reactions are homogeneously
distributed on a circle of radius $R$. The
densities of metabolites and reactions over the circle
are then $\delta_m = N_m/2\pi R$ and $\delta_r = N_r/2\pi R$.
These two densities remain constant in the thermodynamic
limit so that the radius of the circle is
proportional to the number of metabolites or reactions.

2. Each metabolite is assigned a hidden variable $\kappa_m$
and each reaction a hidden variable $\kappa_r$. These random
variables follow probability densities $\rho_m(\kappa_m)$
and $\rho_r(\kappa_r)$, respectively.

3. The connection probability between a reaction with
hidden variable $\kappa_r$ and a metabolite with hidden
variable $\kappa_m$ separated by a distance $d_{mr} = R\Delta\theta_{mr}$
($\Delta\theta_{mr}$ being the angular separation) is given by

$$p(\kappa_m, \theta_m; \kappa_r, \theta_r) = p\left(\frac{d_{mr}}{\mu \kappa_m \kappa_r}\right), \tag{B5}$$

where $p$ can be any integrable function.



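Using the formalism developed in [3^, we compute
the average degree of a metabolite with hidden variable
$\kappa_m$ (notice that since the angular distribution is
homogeneous, this quantity does not depend on the angular
coordinate of the metabolite and so we choose one that is
at $\theta_m = 0$) as

$$\bar{k}_m(\kappa_m) = \int d\kappa_r \, \rho_r(\kappa_r) \, \frac{N_r}{2\pi} \int_{-\pi}^{\pi} d\theta \, p\left(\frac{R|\theta|}{\mu \kappa_m \kappa_r}\right). \tag{B6}$$

Analogously, the average degree of a reaction with hidden
variable $\kappa_r$ is

$$\bar{k}_r(\kappa_r) = \int d\kappa_m \, \rho_m(\kappa_m) \, \frac{N_m}{2\pi} \int_{-\pi}^{\pi} d\theta \, p\left(\frac{R|\theta|}{\mu \kappa_m \kappa_r}\right). \tag{B7}$$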

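By doing the change of variables $x = R\theta/(\mu \kappa_m \kappa_r)$ and taking
the thermodynamic limit $R \to \infty$, we can write

$$\bar{k}_m(\kappa_m) = 2\mu \delta_r I \langle \kappa_r \rangle \kappa_m, \tag{B8}$$

$$\bar{k}_r(\kappa_r) = 2\mu \delta_m I \langle \kappa_m \rangle \kappa_r, \tag{B9}$$

where $I = \int_0^{\infty} dx \, p(x)$. By taking the average again

$$\langle k_m \rangle = 2\mu \delta_r I \langle \kappa_r \rangle \langle \kappa_m \rangle, \tag{B10}$$

$$\langle k_r \rangle = 2\mu \delta_m I \langle \kappa_m \rangle \langle \kappa_r \rangle. \tag{B11}$$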

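We immediately see that the following relation holds

$$\frac{\langle k_m \rangle}{\langle k_r \rangle} = \frac{\delta_r}{\delta_m} = \frac{N_r}{N_m}. \tag{B12}$$

In terms of the average degrees, parameter $\mu$ takes the
form

$$\mu = \frac{\langle k_m \rangle}{2\delta_r I \langle \kappa_r \rangle \langle \kappa_m \rangle} = \frac{\langle k_r \rangle}{2\delta_m I \langle \kappa_r \rangle \langle \kappa_m \rangle} \tag{B13}$$

and, therefore, Eqs. (B8) and (B9) can be rewritten as

$$\bar{k}_m(\kappa_m) = \frac{\langle k_m \rangle}{\langle \kappa_m \rangle} \kappa_m, \tag{B14}$$

$$\bar{k}_r(\kappa_r) = \frac{\langle k_r \rangle}{\langle \kappa_r \rangle} \kappa_r. \tag{B15}$$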



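We always have the freedom to choose the averages of the
hidden variables $\kappa_m$ and $\kappa_r$ to coincide with the actual
averages of the observable variables $k_m$ and $k_r$, that is,
$\langle k_m \rangle = \langle \kappa_m \rangle$ and $\langle k_r \rangle = \langle \kappa_r \rangle$. In this case we can write

$$\bar{k}_m(\kappa_m) = \kappa_m \quad \text{and} \quad \bar{k}_r(\kappa_r) = \kappa_r, \tag{B16}$$

with parameter $\mu$

$$\mu = \frac{1}{2\delta_r I \langle \kappa_r \rangle} = \frac{1}{2\delta_m I \langle \kappa_m \rangle}. \tag{B17}$$

This is the choice that we shall follow in the rest of the
text. The degree distributions can now be easily written
as

$$P_m(k_m) = \int d\kappa_m \, \rho_m(\kappa_m) \, \frac{e^{-\kappa_m} \kappa_m^{k_m}}{k_m!}, \tag{B18}$$

$$P_r(k_r) = \int d\kappa_r \, \rho_r(\kappa_r) \, \frac{e^{-\kappa_r} \kappa_r^{k_r}}{k_r!}. \tag{B19}$$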
2. Specific model for metabolic networks

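In the case of metabolic networks, the distribution of
metabolites' degrees is a power law with exponent $\gamma \approx 2.6$
and the distribution of reactions' degrees is Poisson-like.
We can generate this type of network by choosing

$$\rho_m(\kappa_m) = (\gamma - 1) \kappa_{m,0}^{\gamma-1} \kappa_m^{-\gamma} \quad \text{with} \quad \kappa_m > \kappa_{m,0} = \frac{\gamma - 2}{\gamma - 1} \langle \kappa_m \rangle \tag{B20}$$

and

$$\rho_r(\kappa_r) = \delta(\kappa_r - \langle \kappa_r \rangle). \tag{B21}$$

Reaction degrees are then Poisson distributed, that is,

$$P_r(k_r) = \frac{1}{k_r!} e^{-\langle \kappa_r \rangle} \langle \kappa_r \rangle^{k_r}, \tag{B22}$$

whereas the degree distribution of metabolites is

$$P_m(k_m) = (\gamma - 1) \kappa_{m,0}^{\gamma-1} \frac{\Gamma(k_m - \gamma + 1, \kappa_{m,0})}{k_m!}. \tag{B23}$$

We also choose the connection probability

$$p(x) = \frac{1}{1 + x^{\beta}}, \tag{B24}$$

so that the integral $I = \pi/(\beta \sin(\pi/\beta))$. We can also
choose $\delta_m = 1$ without loss of generality. Therefore, the
relevant (free) parameters of the model are
$\langle \kappa_r \rangle$, $\langle \kappa_m \rangle$, $\beta$, and $\gamma$.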
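
Putting these choices together, a bipartite generator can be sketched in Python as follows. The sketch assumes $\delta_m = 1$, power-law hidden variables for metabolites, a single hidden variable $\langle \kappa_r \rangle$ shared by all reactions, $p(x) = 1/(1 + x^{\beta})$, and a number of reactions fixed by the relation $N_r/N_m = \langle \kappa_m \rangle / \langle \kappa_r \rangle$ of Eq. (B12). The function name `generate_bipartite_network` and the brute-force pair loop are assumptions of this illustration.

```python
import math
import random


def generate_bipartite_network(n_m, mean_kappa_m, mean_kappa_r, gamma, beta, seed=None):
    """One realization of the bipartite model with delta_m = 1.

    Returns (theta_m, theta_r, kappa_m, edges), edges being (metabolite, reaction) pairs.
    """
    rng = random.Random(seed)
    n_r = round(n_m * mean_kappa_m / mean_kappa_r)             # Eq. (B12)
    radius = n_m / (2.0 * math.pi)                              # delta_m = 1
    integral = math.pi / (beta * math.sin(math.pi / beta))      # I for Eq. (B24)
    mu = 1.0 / (2.0 * integral * mean_kappa_m)                  # Eq. (B17), delta_m = 1
    kappa_m0 = mean_kappa_m * (gamma - 2.0) / (gamma - 1.0)     # Eq. (B20)
    theta_m = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_m)]
    theta_r = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_r)]
    kappa_m = [kappa_m0 * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n_m)]
    edges = []
    for m in range(n_m):
        for r in range(n_r):
            delta = math.pi - abs(math.pi - abs(theta_m[m] - theta_r[r]))
            x = radius * delta / (mu * kappa_m[m] * mean_kappa_r)
            if rng.random() < 1.0 / (1.0 + x ** beta):
                edges.append((m, r))
    return theta_m, theta_r, kappa_m, edges
```

Nodes that end up with zero degree are kept in the returned coordinate lists; they would have to be discarded before comparing with an observed network.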



3. Parameter estimation and finite-size effects

All results in the previous section are strictly true in 
the thermodynamic limit. In finite size networks, some 
of the expressions have to be corrected by size dependent 
factors as we will show below. Besides, there is an extra 
complication due to the fact that this model can generate 
nodes with zero degree, which are never observed in a real 
network. 

Suppose we are given a real network with $N_m^{obs}$ metabolites
and $N_r^{obs}$ reactions and average degrees $\langle k_m \rangle^{obs}$ and
$\langle k_r \rangle^{obs}$, with exponent $\gamma$. We now want to estimate the
values of $\langle \kappa_r \rangle$, $\langle \kappa_m \rangle$, $N_m$ and $N_r$ in our model. The first
complication arises because in our model, out of the $N_m$
nodes, there are $P_m(0) N_m$ nodes with zero degree
that cannot be observed. Therefore, if we observe
$N_m^{obs}$ metabolites, the best estimation of $N_m$ is

$$N_m = \frac{N_m^{obs}}{1 - P_m(0)} \tag{B25}$$

and, analogously

$$N_r = \frac{N_r^{obs}}{1 - P_r(0)}. \tag{B26}$$



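The second complication is due to the fact that the average
degree of a power law distribution strongly depends
on the maximum degree observed in the sample. For instance,
in the case of our $\rho_m(\kappa_m) = (\gamma - 1) \kappa_{m,0}^{\gamma-1} \kappa_m^{-\gamma}$,
if the sample is finite, the distribution is truncated at a
certain value $\kappa_{m,c}$ that, typically, increases with the size
of the sample. If we compute the average of $\rho_m(\kappa_m)$ but
only up to the maximum $\kappa_m$ observed, we have

$$\langle \kappa_m(\kappa_{m,c}) \rangle = (\gamma - 1) \kappa_{m,0}^{\gamma-1} \int_{\kappa_{m,0}}^{\kappa_{m,c}} \kappa_m^{1-\gamma} \, d\kappa_m \tag{B27}$$

and so

$$\langle \kappa_m(\kappa_{m,c}) \rangle = \langle \kappa_m \rangle \left[1 - \left(\frac{\kappa_{m,0}}{\kappa_{m,c}}\right)^{\gamma-2}\right]. \tag{B28}$$

Notice that this large parenthesis converges to 1 in the
thermodynamic limit but, for $\gamma \approx 2$, its deviation from 1
can be fairly large even for large systems. Let us call this
factor $\alpha(\kappa_{m,c})$, that is,

$$\alpha(\kappa_{m,c}) = 1 - \left(\frac{\kappa_{m,0}}{\kappa_{m,c}}\right)^{\gamma-2}. \tag{B29}$$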



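Now we need to keep track of the finite size effects from
the very beginning. This means that we have to correct
Eqs. (B8) and (B9) as follows

$$\bar{k}_m(\kappa_m, \kappa_{m,c}) = \kappa_m, \tag{B30}$$

$$\bar{k}_r(\kappa_r, \kappa_{m,c}) = \alpha(\kappa_{m,c}) \kappa_r, \tag{B31}$$

and taking averages

$$\langle k_m(\kappa_{m,c}) \rangle = \alpha(\kappa_{m,c}) \langle \kappa_m \rangle, \tag{B32}$$

$$\langle k_r(\kappa_{m,c}) \rangle = \alpha(\kappa_{m,c}) \langle \kappa_r \rangle. \tag{B33}$$

Notice that, to write this set of equations, we have used
that variable $\kappa_r$ is not power law distributed.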

FIG. 6: Empirical vs. model degree distributions.
Complementary cumulative degree distribution (defined as
$P_c(k) = \sum_{k'=k}^{\infty} P(k')$) of metabolite and reaction degrees
for the E. coli and human metabolisms as compared to two
networks generated with the model using the parameters in
the text.

Still, this average $\langle k_m(\kappa_{m,c}) \rangle$ cannot be directly
identified with the measured average degree because it also
accounts for nodes of zero degree. To correct for this
effect, we write

$$\langle k_m \rangle^{obs} = \frac{\langle k_m(\kappa_{m,c}) \rangle}{1 - P_m(0)} \tag{B34}$$

and so

$$\langle \kappa_m \rangle = \frac{1 - P_m(0)}{\alpha(\kappa_{m,c})} \langle k_m \rangle^{obs} \tag{B35}$$

and analogously

$$\langle \kappa_r \rangle = \frac{1 - P_r(0)}{\alpha(\kappa_{m,c})} \langle k_r \rangle^{obs}, \tag{B36}$$

with

$$P_m(0) = (\gamma - 1) \kappa_{m,0}^{\gamma-1} \, \Gamma(1 - \gamma, \kappa_{m,0}), \tag{B37}$$

$$P_r(0) = e^{-\alpha(\kappa_{m,c}) \langle \kappa_r \rangle}, \tag{B38}$$

$$\kappa_{m,c} = k_m^{max,obs}. \tag{B39}$$

Plugging Eqs. (B20), (B29), (B37), and (B39) into Eq.
(B35), we obtain a closed equation for $\langle \kappa_m \rangle$ that can be
solved numerically. Once this parameter is known, by
inserting it into Eqs. (B29) and (B37) we obtain the
values of $\alpha(\kappa_{m,c})$ and $P_m(0)$. Finally, with the value of
$\alpha(\kappa_{m,c})$ and Eqs. (B36) and (B38) we get the values of
$\langle \kappa_r \rangle$ and $P_r(0)$.
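
A minimal numerical sketch of the first step, the closed equation for $\langle \kappa_m \rangle$, is given below. It assumes the relations $\kappa_{m,0} = \langle \kappa_m \rangle (\gamma - 2)/(\gamma - 1)$, $\alpha = 1 - (\kappa_{m,0}/k_m^{max,obs})^{\gamma-2}$, and $\langle \kappa_m \rangle = [1 - P_m(0)] \langle k_m \rangle^{obs} / \alpha$, with $P_m(0)$ the average of $e^{-\kappa}$ over the power-law density. The incomplete gamma function of Eq. (B37) is replaced by a direct trapezoidal integration, and the root is bracketed by bisection between half and twice the observed average degree, which is an assumption of this sketch and not a general guarantee. The function names are illustrative, and the maximum degree used in the usage note is hypothetical, since the text does not report it.

```python
import math


def zero_degree_fraction(kappa_0, gamma, upper=60.0, steps=6000):
    """P_m(0): average of exp(-kappa) over the power-law density, by trapezoids."""
    h = upper / steps

    def integrand(t):
        kappa = kappa_0 + t
        return kappa ** (-gamma) * math.exp(-kappa)

    area = 0.5 * (integrand(0.0) + integrand(upper))
    area += sum(integrand(i * h) for i in range(1, steps))
    return (gamma - 1.0) * kappa_0 ** (gamma - 1.0) * area * h


def estimate_mean_kappa_m(k_obs, k_max_obs, gamma, tol=1e-10):
    """Solve <kappa_m> = (1 - P_m(0)) / alpha * <k_m>_obs by bisection."""

    def residual(mean_kappa):
        kappa_0 = mean_kappa * (gamma - 2.0) / (gamma - 1.0)
        alpha = 1.0 - (kappa_0 / k_max_obs) ** (gamma - 2.0)
        return (1.0 - zero_degree_fraction(kappa_0, gamma)) / alpha * k_obs - mean_kappa

    lo, hi = 0.5 * k_obs, 2.0 * k_obs  # assumed bracket: residual(lo) > 0 > residual(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `estimate_mean_kappa_m(4.15, 77, 2.65)` uses the observed average metabolite degree and exponent quoted for E. coli together with a hypothetical maximum degree of 77.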

4. Parameters of the real metabolisms 

Using information from the BiGG database [23],
we build bipartite metabolic network representations of
the two analyzed metabolisms, E. coli and human, avoiding
reactions that do not involve direct chemical transformations,
such as diffusion and exchange reactions.
The bipartite representation differentiates two subsets
of nodes, metabolites and reactions, mutually interconnected
through unweighted and undirected links, without
self-loops or dead-end reactions. In particular, we analyze
the iAF1260 version of the K12 MG1655 strain of
the metabolism of E. coli [3T], and the existing annotated
list for human metabolism [22]. For the sake of
simplicity and to enhance the resolution of the applied
algorithm, currency metabolites are eliminated, together
with a few isolated reaction-metabolite pairs and
reaction-metabolite-reaction triplets. For E. coli, this
leads to a final set of 1512 reactions and 1010 metabolites,
while human metabolism is nearly 3/2 times larger, with 2201
reactions and 1482 metabolites. Characteristic power-law
degree distributions for metabolites are readily identified
in both organisms, with exponents that are rather
similar, respectively 2.65 for E. coli and 2.55 for human.
Reactions, meanwhile, conform to Poisson-like distributions,
whose average values are 2.77 and 2.93 respectively.
We used the software "Pajek" to elaborate all network
representations in the figures of this paper.

• To find the parameters of the E. coli metabolic network,
we use a version of the network where different
isomers are considered as different metabolites.
Further, we remove the following currency metabolites:
h-841, h2o-694, atp-338, pi-308, adp-260, ppi-129,
nad-115, nadh-109, amo-85, nadp-83, nadph-81.
Ten isolated metabolite-reaction pairs and six
isolated reaction-metabolite-reaction triplets have
also been removed. For this network, we measure
$N_m^{obs} = 1010$, $N_r^{obs} = 1512$, $\langle k_m \rangle^{obs} = 4.15$,
and $\langle k_r \rangle^{obs} = 2.77$. Using the formalism described
in the previous section, we obtain the following
estimation of the parameters: $\langle \kappa_m \rangle = 4.06$,
$\langle \kappa_r \rangle = 2.65$, $N_m = 1123$, and $N_r = 1720$, and
$R = N_m/2\pi = 178.7$.

• In the case of the human metabolism, the removed
currency metabolites are: h-1250, h2o-916, atp-309,
coa-277, pi-240, adp-237, o2-212, nadp-210,
nadph-207, nad-202, nadh-195, ppi-114. Three isolated
metabolite-reaction pairs have also been removed.
We then measure $N_m^{obs} = 1482$, $N_r^{obs} = 2201$,
$\langle k_m \rangle^{obs} = 4.34$, and $\langle k_r \rangle^{obs} = 2.93$, which
leads to the following estimation of the parameters:
$\langle \kappa_m \rangle = 4.22$, $\langle \kappa_r \rangle = 2.73$, $N_m = 1646$, and
$N_r = 2326$, and $R = N_m/2\pi = 235.9$.
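
The preprocessing and measurements described in these two items can be sketched in Python as follows, assuming the bipartite network is available as a list of (metabolite, reaction) pairs. The function names and the toy reaction identifiers in the usage note are hypothetical, and the removal of isolated pairs and triplets mentioned above is not covered by this sketch.

```python
def remove_currency_metabolites(edges, currency):
    """Drop every link whose metabolite belongs to the currency set."""
    return [(m, r) for m, r in edges if m not in currency]


def bipartite_statistics(edges):
    """Observed sizes and average degrees of a bipartite edge list."""
    links = set(edges)
    metabolites = {m for m, _ in links}
    reactions = {r for _, r in links}
    return {
        "n_metabolites": len(metabolites),
        "n_reactions": len(reactions),
        "mean_k_m": len(links) / len(metabolites),
        "mean_k_r": len(links) / len(reactions),
    }
```

For instance, with the toy list `[("h2o", "R1"), ("glc", "R1"), ("g6p", "R1"), ("atp", "R1"), ("g6p", "R2"), ("f6p", "R2")]` and the currency set `{"h2o", "atp"}`, three metabolites and two reactions remain, connected by four links.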

In Fig. 6 we show the degree distributions for both
E. coli and human metabolisms and compare them with
those corresponding to networks generated by the
$\mathbb{S}^1 \times \mathbb{S}^1$ model. The exponent $\beta$ takes the value
$\beta = 1.3$ in both networks. The agreement between the model
and the real metabolic networks is very good for metabolites.
However, the model overestimates the probability of reactions
involving five or more metabolites.
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Appendix C: Embedding algorithm and validation 
on $\mathbb{S}^1 \times \mathbb{S}^1$ synthetic networks

Once the parameters $\langle \kappa_r \rangle$, $\langle \kappa_m \rangle$, $\beta$, and $\gamma$ are
estimated, we perform the embedding of the bipartite network
to infer the angular coordinates of metabolites
and reactions. Let $A = (a_{ij})_{N_m \times N_r}$, $i = 1, \dots, N_m$,
$j = 1, \dots, N_r$, be the adjacency matrix of the network,
defined as $a_{ij} = 1$ if metabolite $i$ participates in reaction
$j$ and zero otherwise (in the rest of the text, symbol $i$
is reserved to enumerate metabolites and symbol $j$ to
enumerate reactions). Our goal is to find the set of coordinates
$\{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\}$ that best match the
$\mathbb{S}^1 \times \mathbb{S}^1$ model in a statistical sense. To this end, we use
maximum likelihood estimation (MLE) techniques. Let us compute the
posterior probability, or likelihood, that a network given
by its adjacency matrix $A$ is generated by the
$\mathbb{S}^1 \times \mathbb{S}^1$ model, $\mathcal{L}(A)$. This probability is

$$\mathcal{L}(A) = \int \cdots \int \mathcal{L}(A, \{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\}) \prod_{i=1}^{N_m} d\theta_{m,i} \, d\kappa_{m,i} \prod_{j=1}^{N_r} d\theta_{r,j}, \tag{C1}$$

where function $\mathcal{L}(A, \{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\})$ within the
integral is the joint probability that the model generates
the adjacency matrix $A$ and the set of hidden variables
$\{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\}$ simultaneously. Using Bayes' rule, we
can compute the likelihood that nodes' coordinates take
particular values $\{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\}$ given the observed
adjacency matrix $A$. This probability is simply given by



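$$\mathcal{L}(\{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\} | A) = \frac{\mathcal{L}(A, \{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\})}{\mathcal{L}(A)} = \frac{\mathrm{Prob}(\{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\}) \, \mathcal{L}(A | \{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\})}{\mathcal{L}(A)}, \tag{C2}$$

where

$$\mathrm{Prob}(\{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\}) = \frac{1}{(2\pi)^{N_m + N_r}} \prod_{i=1}^{N_m} \rho_m(\kappa_{m,i}) \tag{C3}$$

is the prior probability of the hidden variables given by
the model,

$$\mathcal{L}(A | \{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\}) = \prod_{i=1}^{N_m} \prod_{j=1}^{N_r} p(x_{ij})^{a_{ij}} \left[1 - p(x_{ij})\right]^{1 - a_{ij}} \tag{C4}$$

is the likelihood of observing $A$ if the hidden variables
are $\{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\}$,

$$x_{ij} = \frac{N_r \Delta\theta_{ij}}{\beta \sin(\pi/\beta) \, \kappa_{m,i}}, \tag{C5}$$

$$\Delta\theta_{ij} = \pi - \left|\pi - |\theta_{m,i} - \theta_{r,j}|\right|, \tag{C6}$$

and $p(x)$ is given by Eq. (B24).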



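The MLE values of the hidden variables
$\{\kappa^*_{m,i}, \theta^*_{m,i}, \theta^*_{r,j}\}$ are then those that maximize the
likelihood in Eq. (C2) or, equivalently, its logarithm,

$$\ln \mathcal{L}(\{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\} | A) = C - \gamma \sum_{i=1}^{N_m} \ln \kappa_{m,i} + \sum_{i=1}^{N_m} \sum_{j=1}^{N_r} \left\{ a_{ij} \ln p(x_{ij}) + (1 - a_{ij}) \ln\left[1 - p(x_{ij})\right] \right\}, \tag{C7}$$

where $C$ is independent of the nodes' coordinates
$\{\kappa_{m,i}, \theta_{m,i}, \theta_{r,j}\}$.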



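1. MLE for expected metabolites' degrees $\kappa_m$

The derivative of Eq. (C7) with respect to the expected
degree $\kappa_{m,l}$ of metabolite $l$ is

$$\frac{\partial}{\partial \kappa_{m,l}} \ln \mathcal{L} = -\frac{\gamma}{\kappa_{m,l}} - \frac{\beta}{\kappa_{m,l}} \left( \sum_{j=1}^{N_r} p(x_{lj}) - k_{m,l} \right). \tag{C8}$$

The first term within the parenthesis is the expected degree
of metabolite $l$, while the second term is its actual
degree $k_{m,l}$. Therefore, the value $\kappa^*_{m,l}$ that maximizes
the likelihood is given by

$$\kappa^*_{m,l} = k_{m,l} - \frac{\gamma}{\beta}. \tag{C9}$$

Since $\kappa^*_{m,l}$ can be smaller than $\kappa_{m,0}$ in the last equation, we
set

$$\kappa^*_{m,l} = \max\left( \frac{\gamma - 2}{\gamma - 1} \langle \kappa_m \rangle, \; k_{m,l} - \frac{\gamma}{\beta} \right). \tag{C10}$$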



2. MLE for angular coordinates

Having found the MLE values for expected degrees $\kappa^*_{m,i}$,
we now have to maximize Eq. (C2) with respect to angular
coordinates. This task is equivalent to maximizing
the partial log-likelihood

$$\ln \mathcal{L}(A | \{\kappa^*_{m,i}, \theta_{m,i}, \theta_{r,j}\}) = \sum_{i=1}^{N_m} \sum_{j=1}^{N_r} \left\{ a_{ij} \ln p(x_{ij}) + (1 - a_{ij}) \ln\left[1 - p(x_{ij})\right] \right\}. \tag{C11}$$



The maximization of Eq. (C11) with respect to the angular
coordinates cannot be performed analytically and
we have to rely on numerical optimization procedures.
Unfortunately, the low degrees of reactions imply that
any attempt to maximize Eq. (C11) directly is doomed
to fail. Indeed, the uncertainty in the position of a low
degree reaction is necessarily very high. This, in turn,
increases the uncertainty in the position of its neighboring
metabolites, which translates into global uncertainty
in the localization of reactions and metabolites. We therefore
adopt a different strategy. Starting from the original
bipartite network, we construct its one-mode projection
over the space of metabolites, that is, we consider only
one type of nodes (metabolites) and declare two metabolites
as connected if they participate in the same reaction
in the original bipartite network. If metabolite degrees are
power-law distributed in the bipartite network, the degrees
of the obtained unipartite network are also power-law
distributed with the same exponent. This solves the problem
mentioned above because, now, high degree nodes can be
located with high accuracy so that we can afterwards use
these nodes as a template to find the coordinates of the
rest of the nodes.

FIG. 7: Calibration of the embedding algorithm. The left plot shows the inferred angular coordinates of metabolites and
reactions vs. the real ones of a network generated with the $\mathbb{S}^1 \times \mathbb{S}^1$ model with the same parameters as the real metabolism. The right
plot shows the empirical connection probability obtained from the embedding compared to the theoretical one in Eq. (B24).
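
The one-mode projection over metabolites described in this section can be sketched in Python as follows, assuming the bipartite network is given as a list of (metabolite, reaction) pairs. The function name `metabolite_projection` is an illustrative choice, and the projection is kept unweighted, so two metabolites sharing several reactions still produce a single link.

```python
from itertools import combinations


def metabolite_projection(edges):
    """Link two metabolites whenever they participate in the same reaction."""
    members = {}
    for metabolite, reaction in edges:
        members.setdefault(reaction, set()).add(metabolite)
    projected = set()
    for metabolites in members.values():
        # Sorting gives each undirected pair a unique representation.
        for a, b in combinations(sorted(metabolites), 2):
            projected.add((a, b))
    return projected
```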

We find the angular coordinates of metabolites by fitting
the one-mode projected network using the $\mathbb{S}^1$ model
as described in [20]. Once the angular coordinates $\theta^*_{m,i}$
are known, we find the optimal angular coordinates of
reactions by maximizing Eq. (C11) but using the already
known coordinates of metabolites as fixed inputs. This
final maximization is a simple procedure because, with
the metabolite coordinates $\theta^*_{m,i}$ fixed, we can maximize
the likelihood of each reaction independently.
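
A minimal Python sketch of this last step is shown below. For one reaction, it scans a regular grid of candidate angles and keeps the one that maximizes that reaction's contribution to the partial log-likelihood, assuming $p(x) = 1/(1 + x^{\beta})$ and an effective distance $x_{ij} = N_r \Delta\theta_{ij} / (\beta \sin(\pi/\beta) \kappa_{m,i})$. The grid search, the function name `place_reaction`, and the default grid size are assumptions of this illustration, not the optimization actually used by the authors, and metabolite angles are assumed to lie in $[0, 2\pi)$.

```python
import math


def place_reaction(neighbors, theta_m, kappa_m, n_r, beta, grid_size=720):
    """Angle maximizing one reaction's log-likelihood, metabolites being fixed.

    neighbors: set of indices of the metabolites taking part in the reaction.
    """
    scale = n_r / (beta * math.sin(math.pi / beta))
    best_angle, best_score = 0.0, -math.inf
    for step in range(grid_size):
        angle = 2.0 * math.pi * step / grid_size
        score = 0.0
        for i, (theta, kappa) in enumerate(zip(theta_m, kappa_m)):
            delta = math.pi - abs(math.pi - abs(theta - angle))
            x = scale * delta / kappa
            log_one_plus = math.log1p(x ** beta)
            if i in neighbors:
                score -= log_one_plus                          # ln p(x)
            elif x > 0.0:
                score += beta * math.log(x) - log_one_plus     # ln[1 - p(x)]
            else:
                score = -math.inf  # on top of a non-neighbor: impossible
                break
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Because each reaction is treated independently, the function can simply be called once per reaction with the corresponding neighbor set.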

We first test the described procedure in synthetic networks
generated by the $\mathbb{S}^1 \times \mathbb{S}^1$ model with the same
parameters as the real E. coli metabolism. Results are
shown in Fig. 7. The left plot shows the inferred angles
for metabolites and reactions vs. the real ones. As can
be clearly seen, up to minor fluctuations and a global
phase shift due to the rotational symmetry of the model,
the agreement between the real coordinates and those
inferred by the algorithm is very good. The right plot
shows the connection probability using the inferred
coordinates vs. the one used to generate the model, Eq. (B24).
Again, the agreement between the two is excellent.



Appendix D: Classification of pathways in E. coli 
depending on localization 

See Table I. 



Appendix E: Pathway crosstalk and the disparity
filter

We use the following measure of crosstalk between
pathways:

$$XT_{P_a P_b} = \sum_{u \in M_{ab}} \sum_{j \in P_a} \sum_{j' \in P_b} \left. \left( p(x_{uj}) + p(x_{uj'}) \right) \right|_{\text{observed links}}, \tag{E1}$$

where $u \in M_{ab}$, the set of metabolites shared by the
reactions in the two pathways $P_a$ and $P_b$, and only
probabilities of connections associated to observed links are
considered.

Of 561 possible pathway pairs in E. coli, 460 have
non-zero crosstalk (82.00%) with a minimum value of 1.80
and a maximum of 159.91. In human cells, of 4278 possible
pathway pairs, 1689 are non-zero (38.64%) with a
minimum crosstalk of 1.19 and a maximum of 131.28.
Moreover, there is an isolated pathway (48, Limonene
and Pinene Biosynthesis) without crosstalk (no common
metabolites with other pathways). So, at this level, human
cell metabolism seems to be more modular than that of
E. coli.

The obtained pathway crosstalk matrices are filtered
to obtain backbones according to the multiscale methodology
in [26], which does not belittle small pathways and
gives an effective tradeoff between maximum weight and
nodes in the backbone with the minimum number of
links. A global threshold filter would lose many more
nodes for the same number of links and weight in the
backbone, see Fig. 8.

The disparity filter methodology preserves interactions
with a statistically significant intensity for at least one of
the two nodes the edge is incident to. To decide whether
a connection is relevant, the filter compares against a null
hypothesis which assumes that the local weights associated
to a node are uniformly distributed at random. In
this way one discounts intensities that could be explained
by random fluctuations. More specifically, a p-value (the
probability $\alpha_{ij}$ that, if the null hypothesis is true, one
obtains a value for the normalized weight $w_{ij}/s_i$ between
nodes $i$ and $j$ larger than or equal to the observed one) is
calculated for each edge in the network. By imposing a
significance level $\alpha$, the links that carry weights that can
be considered not compatible with a random distribution
can be filtered out with a certain statistical significance.
Links in the backbone will then be those which satisfy

$$\alpha_{ij} = 1 - (k - 1) \int_0^{w_{ij}/s_i} (1 - x)^{k-2} \, dx < \alpha, \tag{E2}$$

where $k$ is the degree of node $i$. By changing the
significance level, we can filter out the links progressively,
focusing on more relevant ones. As a result, the disparity
filter significantly reduces the number of edges in the
original network, while keeping a large fraction
of the total weight and the total number of nodes. It
preserves as well the cutoff of the degree distribution,
the form of the weight distribution, and the clustering
coefficient.

TABLE I: Classification of E. coli's pathways. Pathways are classified as "localized" (75% of the pathway localized in a single
bin), "bimodal" (75% of the pathway localized in two bins), "multi-peaked" (75% of the pathway localized in three bins or
more with at least one peak above 25%), and "transversal" (no bin above 25%) according to the results and bin size of Fig. [s].
Pathways in italics indicate that, although they are split in two or three bins, these bins are adjacent and so a change in the
bin resolution would lead to their redefinition as more localized pathways.

FIG. 8: Disparity backbone vs global threshold backbone.



Appendix F: Results for human cells metabolism

In Fig. 9 we show the embedding representation of human
cells metabolism. In Fig. 10 we show the angular
distribution on the ring of the whole list of pathways
evaluated from the circle-based embedding of the reactions
they involve.
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FIG. 9: Human metabolism map. Yellow circles represent reactions whereas blue squares are metabolites. For each 
metabolite, the symbol size is proportional to the logarithm of the degree and the symbol is placed radially according to the expression
$r = R - 2\ln k_m$. Black (grey) connections are those that, according to the model, have a probability of existence larger (smaller)
than 0.5.
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FIG. 10: Angular distribution of pathways for the human metabolism. The whole angular domain [0, 360°] is divided 
in 50 bins of 7.2° each and for each bin we compute the fraction of reactions of the pathway in it. Each pathway is shown in a
different graph. Different colors indicate different metabolic families. Panel I: black for Amino Acids metabolism (numbering
the graphs from left to right and from top to bottom, 1-14), red for metabolism of Other Amino Acids (15-21), dark green for 
Nucleotide metabolism (22-28), turquoise for Energy metabolism (29,30), purple for biosynthesis of Other Secondary Metabolites 
(31-34), brown for miscellaneous and others (35,36). Panel II: orange for Carbohydrate metabolism (1-16), blue for metabolism 
of Cofactors and Vitamins (17-30), violet for Transport pathways (31-33), light green for Xenobiotics Biodegradation (34). 
Panel III: orange for Glycan metabolism (1-11), and dark brown for Lipid metabolism (12-24). Pathway names have been 
abbreviated in standard forms whenever possible. 
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